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Abstract 

Bayesian model averaging (BMA) is a statistical method for post-processing forecast 
ensembles of atmospheric variables, obtained from multiple runs of numerical weather 
prediction models, in order to create calibrated predictive probability density functions 
(PDFs). The BMA predictive PDF of the future weather quantity is the mixture of the 
individual PDFs corresponding to the ensemble members and the weights and model 
parameters are estimated using ensemble members and validating observation from a 
given training period. 

In the present paper we introduce a BMA model for calibrating wind speed fore- 
casts, where the components PDFs follow truncated normal distribution with cut-off 
at zero, and apply it to the ALADIN-HUNEPS ensemble of the Hungarian Meteoro- 
logical Service. Three parameter estimation methods are proposed and each of the 
corresponding models outperforms the traditional gamma BMA model both in cali- 
bration and in accuracy of predictions. Moreover, since here the maximum likelihood 
estimation of the parameters does not require numerical optimization, modelling can 
be performed much faster than in case of gamma mixtures. 

Key words: Bayesian model averaging, continuous ranked probability score, ensemble 
calibration, truncated normal distribution. 



1 Introduction 



The most important aim of weather forecasting is to give a robust and reliable prediction of 
the future state of the atmosphere based on observational data, prior forecasts and mathe- 
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matical models describing the dynamical and physical behaviour of the atmosphere. These 
models consists of sets of hydro-thermodynamic non-linear partial differential equations of 
the atmosphere and its coupled systems (like surface or oceans for instance) and have only 
numerical solutions. The difficulty with these numerical weather prediction models is that 
since the atmosphere has a chaotic character the solutions strongly depend on the initial 
conditions and also on other uncertainties related to the numerical weather prediction pro- 
cess. Therefore, the results of such models are never fully accurate. A possible solution 
is to run the model with different initial conditions (since the uncertainties in the initial 
conditions are one of the most important sources of uncertainty) and produce an ensemble 
of forecasts. Using a forecast ensemble one can estimate the probab ility distribution of fu- 
ture weather variables which allows probabilistic weather forecasting (IGneiting and Raftery . 



20051 ). where not only the future atmospheric states are predicted, but al so the related un- 



certainty information. The ensemble predic tion method was proposed by [Le ith (1974) and 
since its first operational implementation (IBuizza et all . Il993t iToth and Kalnavl . 119971 ) it 
has become a widely used technique all over the world and the users understand more and 
more its merits and economic value as well. However, although e.g. the ensemble mean 
on average yields better forecasts of a meteorological quantity than any of the individual 
ensemble me mbers, it is often the case that the ensemble is under-dispersive and in this way, 
uncalibrated (IBuizza et all . 120051 ). so that calibration is needed to account for this deficiency. 

The Bayesian model averaging ( BMA) method for p ost-processing ensembles in order 
to calibrate them was introduced by iRaftery et all (120051 ) . The basic idea of BMA is that 
for each member of the ensemble forecast there is a corresponding conditional probability 
density function (PDF) that can be interpreted as the conditional PDF of the future weather 
quantity provided the considered forecast is the best one. Then the BMA predictive PDF of 
the future weather quantity is the weighted sum of the individual PDFs corresponding to the 
ensemble members and the weights are based on the relative performance of the ensemble 
members during a given training period. In this way B MA is a special, fix ed parameter 
version of dynamic model averaging method developed by lRaftery et all ( 120101 ) . The weight 
parameters and the parameters of the individual PDFs are estimated using linear regression 
and maximum likelihood (ML) method, where the maximum of the likelihood function is 
found by EM algorithm. In practice, the performance of the individual ensemble members 
should have a clear characteristic (and not a random one) or if it is not the ca s e this fact 
should be taken int o account at the calibration process (see e.g. iFralev et all 120101 ) . In 
Rafterv et all ( 120051 ) the BMA method was successfully applied to obtain 48 hour forecasts 
of surface temperature and sea level pressure in the North American Pacifi c Northwest based 
on th e 5 members of the University of Washington Mesoscale Ensemble ( jGrimit and Massl . 



20021 ) . These weather quantities ca n be modeled by norm al distributions, so the predictive 



PDF is a Gaussian mixture. Later, ISloughter et all (120071 ) developed a discrete-continuous 
BMA model for precipitation forecasting, where the discrete part corresponds to the event 
of no precipitation, while the cubic root of the precipitation amount (if it is positive) is 



3 



modeled by a gamma distribution. In ISloughter et al\ (120101 ) the BMA method was used for 
wind speed forecasting and the component P DFs follow gamm a distributions, while using 



von Mises distribution to model angular data. iBao et al 



to predict surface wind direction. Finally, ISloughter et al. 



201oh introduced a BMA scheme 



(120121 ) described a BMA model for 



wind vector forecasting, where wind vectors are modeled using bivariate normal distribution. 

The bivariate normal model for wind vectors is also used in t he ensemble model o utput 
statistics (EMOS) meth od for post-processing ensemble forecasts (ISchuhen et all 120121 ) . The 
EMOS, introduced by iGneiting et all (120051 ) for calibrating ensemble forecasts following 
normal distribution (sea level pressure, temperature), produces a single normal PDF, where 
the mean and the variance depends on the e nsemble members. However, the meth od can be 
extended to truncated normal distribution ( IThorarinsdottir and Gneitingl . l2010l ). too, and 
in this way it can be used for calibrating wind speed data. 

In the present paper we develop a BMA mo del for wind speed forecasting wher e the 
component PDFs, similarly to the EMOS PDF of IThorarinsdottir and Gneitingl (J2010|), fol- 
low truncated normal distribution. The advantage of this model to the gamma model of 
Sloughter et al\ (120101 ) is that for parameter estima tion the truncated data EM algorithm 
for Gaussian mixture models (ILee and Scottl . 120121 1 can be used which works with closed 
formulae both in expectation (E) and in maximization (M) steps. In this way the estimation 
of parameters is much faster, which is a key issue in operational applications. 

We test our model on the ensemble forecasts of wind speed produced by the operational 

the Hungarian Met eoro- 



Limited Area Model Ensemble Prediction System (LAMEPS) o 

logical Service (HMS) called ALADIN-HUNEPS (jHagelL bold iHoranvi et all l201lh and 
compare the results obtained with the co rresponding resu l ts of Bar an et al. ( 2013al ) where 



for calibration the BMA gamma model of ISloughter et al\ (120101 ) was used 



2 Data 



The ALADIN-HUNEPS system of the HMS covers a large part of Continental Europe with 
a horizontal resolution of 12 km and it is obtained by dynamical downscaling (by the AL- 
ADIN limited area model) of the global AR PEGE based PEARP system of Meteo France 
(IHoranyi et all 120061 ; iDescamps et all 120091 ). The ensemble consists of 11 members, 10 ini- 
tialized from perturbed initial conditions and one control member from the unperturbed 
analysis, implying that the ensemble contains groups of exchangeable forecasts. The data 
base contains 11 member ensembles of 42 hour forecasts for 10 meter wind speed (given 
in m/s) for 10 major cities in Hungary (Miskolc, Szombathely, Gyor, Budapest, Debrecen, 
Nyfregyhaza, Nagykanizsa, Pecs, Kecskemet, Szeged) produced by the ALADIN-HUNEPS 
system of the HMS, together with the corresponding validating observations for the period 
between October 1, 2010 and March 25, 2011 (176 days, or 1760 data points). The forecasts 
are initialized at 18 UTC. The data set is fairly complete since there are only two days 
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Figure 1: Verification rank histogram of the 11-member ALADIN-HUNEPS ensemble. Pe- 
riod: October 1, 2010 - March 25, 2011. 



(18.10.2010 and 15.02.2011) where three ensemble members are missing for all sites and one 
day (20.11.2010) when no forecasts are available. 

Figure [1] shows the verification rank histogram of the raw ensemble. This is the histogram 
of ranks of validating observations with respect to the corresponding ensemble forecasts 
computed fr om the ranks at all stations and over the whole verification period (see e.g. 



Wilksl . l201ll . Section 7.7.2). This histogram is far from the desired uniform distribution as 



in many cases the ensemble members either underestimate or overestimate the validating 
observations (the ensemble range contains the observed wind speed in 61.21% of the cases, 
while its nominal value equals 10/12, i.e 83.33%). Hence, the ensemble is under-dispersive 
and in this way it is uncalibrated. Therefore, statistical post-processing is required to improve 
the forecasted probability density function. 



3 Methods 



3.1 Bayesian model averaging 



Let fi, f 2 , ■ ■ • , /m denote the ensemble forecast of a c ertain weather quan tity X for a given 
location and time. In BMA for ensemble forecasting ( iRafterv et all 120051 ) to each ensemble 
member corresponds a component PDF gk(x\fk,0k), where 9k is a parameter to be 
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estimated. The BMA predictive PDF of X is 



M 



p{x\ jfi, . . . , f M ; 6>i, . . . , M ) := ^2,u k 9k{x\ f k ,0 k ), 



(3.1) 



k=l 



where the weight u k is connected to the relative performance of the ensemble member f k 
during the training period. Obviously, the weights form a probability distribution and in 
this way they are nonnegative and Yll=i w fc = 1- 

BMA model (13.1 1) is valid only in the cases when the sources of the ensemble mem- 
bers are clearly disting uishable, as for the University of Washington mesoscale ensemble 
fjEckel and Massl . 120051 ). However, most of the currently used ensemble prediction systems 
produce ensembles where some ensemble members are statistically indistinguishable. Usu- 
ally, these exchangeable ensemble members are obtained with the help of perturbations of the 
initial conditions, which is th e case for the 51 member Europ ean Centre for Medium- Range 
Weather Forecasts ensemble (ILeutbecher and Palmer! . 120081 ) or for the ALADIN-HUNEPS 
ensemble described in Section [2j 

Suppose we have M ensemble members divided into m exchangeable groups, where 
the fcth group contains M k > 1 ensemble members, so Y T!-! M k = M. Furth er, denote 
by fk,£ the £th member of the kth group. For this situation iFralev et al\ ( 120101 ) suggested 
to use model 



p(x\f : 



/i, 



Mi 



m,l) 



m,M m t 



m M k 



#m) '■= £^£^U k g k (x\ ^ k ' £ ' 1 



(3.2) 



k=i 



where ensemble members within a given group have the same weights, PDFs and parameters. 

To simplify notations we give the results and formulae of this section for model (13. ip . 
but their generalization to model (13. 2p is rather straightforward. 



3.2 Truncated normal model 



As it was mentioned in the Introduction, for weather variables such as temperature, pres- 
sure or wind vectors BMA models with normal component PDFs can be fit reasonably well. 
However, for modeling wind speed, which can take only nonnegative values, a skewed dis- 



tribu tion is r equired. A popu l ar can didate is the Weibull distribution (see e.g. lJustus et al. 



19781 ). while ISloughter et all (l2010f) consid e rs a B MA model based on gamma distribution 



Here we follow the ideas of iGneiting et al\ (120061 ) and iThorarinsdottir and Gneitingl (120 10f ) 
and for wind speed modeling we employ a truncated normal distribution with a cut-off at 
zero Af°(n,(x 2 ) with PDF 

g(x\^) := P)/*) ,> and „):=„, otherwise, (3.3) 
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where <p and $ denote the PDF and the cumulative distribution function (CDF) of the 
standard normal distribution, respectively. The mean k and variance g 2 of A/"°(/i, a 2 ) 
are 

9 2 = a 2 ( 1 _ Wit 1 /* 7 ) _ ( ^W ff ) S " 



K = jJL + 



<T(p{p./o) 



and 







(3.4) 



respectively. 

We assume that location /z is a linear function of the forecasted wind speed that leads 
to BMA mixture model 

M 

p(x\ fi, . . . , f M ] ai,..., a M ] Pi,---, Pm] ci, • • • , cm) '■ = 22 Uk 9( x \ a k + Pkfk, ffc), (3-5) 

fc=l 

where g is the PDF defined by f)3.3p . To simplify the model, in what follows we also assume 
<7i = 02 = . . . = aM —: a. This reduces the num ber of paramete r s and makes computations 
easier. Furthermore, the BMA gamma model of ISloughter et al\ (120101 ). used as a reference, 
also operates with this restriction. 



3.3 Continuous ranked probability score 



Continuous ranke d probability score (CRPS) i s the most p opular scoring rule for evaluating 



density forecasts (IGneiting and Raftervl . 
number x, the CRPS is defined as 



2007 



WilkaLl201lh . Given a CDF F(y) and a real 



/OO 1 
(F(y) - l {y > x} ) 2 dy = E\X - x\ - -E\X - X'\ 
-oo ^ 



(3.6) 



where X and X' are independent random variables with CDF F and finite first moment. 
CRPS is a proper scoring rule which is negatively oriented, i.e. the smaller the better, and 
it can be reported in the same unit as the observation. 

Now, if X ~ Af° (fi, cr 2 ) then short calculation shows 



Si(x,h,<j) 



RX-x\ 



A(x-fi, a 2 ) +(x-fi) ($ (n/a) - 1) - a(p (fi/a) 



where A(n, a 2 ) := E\Y\ with Y ~ Af(fi, a 2 ), that is 

A(fx,a 2 ) = n(2$(fi/a) - l) + 2a<p(n/a). 

Further, if X x ~ A/"°(//i, of) and X 2 ~ Af°(fj, 2 , erf) are independent, then the PDF of 
\X!-X 2 \ is 

/, M ,(*)=L^ 

\<TlJ \°"2/ V a d J V a\<*&) \ °d ) \ Cr 2 (T dy 
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where 



Hence, 



■= Hi - A*2, 



CTd 



and Q d 



SsCAilj ^2,0-1,0-2) := E|Xi - X 2 | = $(/Xi/cri)$(/x 2 /cr 2 ) 



x 



where the correction term 
C(/ii,/i 2 ,cri,o- 2 ) := 



r ( x - — ) $ ( —s - 1 + ( a; + — 1 $ ( —x - £ d 



dx 



can only be evaluated numerically. Now, using (13. 6 p one can easily obtain the CRPS corre- 
sponding to the CDF V of mixture model (I3.5p . namely 

M ^ M M 

crps (V,x) = ^u; k Si(x,a k + (3 k f k ,a k ) - - ^^^^^(afc + (3 k f k ,a E + (3 e f e , a k , a e ). 



k=l 



k=l 1=1 



3.4 Parameter estimation 



Parameters a k , (3 k , oo k , k — 1,2, ... M, and a are estimated using training data consisting 
of ensemble members and verification observations from the preceding n days (training 
period). In what follows, f k , s ,t denotes the kth ensemble member for location sGiS and 
time i G T and by x Sjt we denote the corresponding validating observation. We consider 
three approaches which mainly differ in estimation of the parameters a k , (3 k of location. 



3.4.1 Naive approach 



Similarly to the normal BMA model of iRaftery et al\ (j2005h we estimate location parameters 
a k and /3 k with a linear regression of x Sjt on f k , s ,t over the time points in the training 
period. We call this approach naive since it does not take into account that for truncated 
normal distribution A/"°(/i,cr 2 ) location fi does not equal to the mean k. However, 
one should remark that with the increase of /i the correction term in (13. 4p decreases in an 
exponential rate. 

To estimate weights u k , k = 1, 2, ... , M, and scale parameter a maximum likelihood 
method is applied using again training data. Under the assumption of independence of 
forecast errors in space and time the log-likelihood function corresponding to model (13. 5 p 
equals 



s.t 



M 



2j u k g (x S! t I a k + /3hfk,s,ti a ) 



k=l 



(3.7) 
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where the first summation is over all locations s £ S and time points t from the training 
period containing iV terms (N distinct values of (s,t)). 

The log-likelihood function (13. 7p is too complicated to b e maximized analyti cally, so we 
find its maximum using the truncated data E M algorithm oflLee and Scottl (120121). Similarly 



to th e traditional EM algorithm for mixtures ( iDempster et al. 



1977 



McLachlan and Krishnan 



19971 1 we introduce latent indicator variables z^^.t taking values one or zero according as 
whether x 8> t comes from the fcth component PDF or not. The complete data log-likelihood 
corresponding to observations and indicator variables equals 

M 



£c(ui, ■ ■ ■ ,Um] cr) 



s.t k=l 



log(wjfc) + log [g(x Sjt \ o k + Pkfk,s,t, o)j 



(3. 



The EM algorithm alternates between an expectation (E) step and a maximization (M) 
step until convergence. It starts with initial values wj[ , k = 1,2, ...,M, and of 
the parameters. In the E step the latent variables are estimated using the conditional 
expectation of the complete log-likelihood on the observed data, while in the M step the 
parameter estimates are updated by maximizing ic with the current values of the latent 
variables plugged in. 

For the truncated normal mixture model given by ( 13. 3 j) and (13. 5p the E step is, 

u^g(x Sjt \ a k + flkfk,s,t, 



S C7+1) 



(3.9) 



where the superscript refers to the actual iteration. The first part of the M step is obviously 



to 



(f+i) ._ 



s.t ' 



(3.10) 



while the second part can be derived from equation -4 s - 



0. However, in our case this 
equation is nonlinear and since it cannot be solved for a, we suggest iteration step 

M 



a 



2(7+1) 



s.t k=l 



Ok 



Pkfk,s,t) 



(3.11) 



a 



(f) 



M 



N 



s.t k=l 



<p((a k + l3kfk,s,t)/<r U) ) 



Observe that t 



re EM algorithm presented here differs from the corresponding algorithm of 



Raftery et al\ (120051 ) only in the second term of (13.111) . 



3.4.2 Mean corrected approach 



In this approach we assume that the means of the component PDFs are also linear functions 
of form ak + bkfk.s.t of the forecasted wind speed and the linear regression of x Sjt on fk, s ,t 
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is now used to estimate a k and b^, k = 1,2, ...,M. Instead of (13.81) we consider the 
complete data log-likelihood 



M 



£c(ui, • • • , o) = ^2 ^2 Zk > s > 1 



\og{u k ) + log (g(x St t\ lik, B ,ti<r 



s,t fe=l 

where the initial guess for the location parameter fi k ,s,t of the fcth component PDF at (s, t) 

is V%s,t '■= a k + hfk,s,t- 

Now, E step ( 13. 9 p is replaced by 



the first part (I3.10P of the M step remains valid, while 



(3.12) 



u U+i) ._ (o) (;). 



(3.13) 



and 



a 



20+1) = _1 

N 



, M 9 (j) 

1 ff 



M 



/ j 7 j Z k,s,t \ X s,t ^k,s,t 
s.t k=l 



'k,s,t ^k,s,t 



s,t k=l 



(3.14) 



substitute iteration step ( 13. lip . Observe that ( I3.13P is responsible for the mean correction 
in the first term of (13. 4p . Finally, after the EM algorithm stops, parameters a k and /3& 
are estimated with a linear regression of the final value of fik, s ,t on fk, s ,t- 



3.4.3 Full maximum likelihood estimation 

In the previous two cases the estimates of location parameters a k and (3 k , k = 1, 2, . . . , M, 
are obtained separately from the weights and the scale parameter. Here we present a method 
where all parameters are estimated with ML method using the complete data log-likelihood 
defined by (13. 8p . Obviously, in this case £c is considered as a function of a k , (3k, 0Jk, k — 
1,2,..., M, and a. The initial guesses a k and for these parameters are the 

corresponding mean coefficients obtained from regressing of x s>t on fk, s ,t over the time 
points in the training period. In this way the initial value ^ k °lt := a i°^ + Pk fk,s,t of the 
location parameter is exactly the estimated mean of the kth component PDF. 

The E step remains the same as in the mean corrected approach, that is estimate 
of the latent variable Zk )S ,t is given by ( I3.12p . The first part (I3.10p of the M step remains 
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valid again, while equations Jgf and |^ result in iterations 



a 



(i+i) 



s.t 



n -1 



Cj+i) 



k,s,t Jk,s,t 



s.t 



s,t 

-1 



Z k,s,t^ fk 



k,s,t x a t - a 



S.t 



<I, (/';,'>.r^ :/: 



Mean correction (13. 1 3[) takes form 



u (0) 



^((«Sf +1) +^ +1) /M,*)/ 



a 



U) 



(3.15) 



while the estimate of variance is updated using ( I3.14p . 



4 Results 



As was mentioned in the Introduction, the performance of BMA model (I3.5P is tested on the 
ALADIN-HUNEPS ensemble of HMS. We consider all three parameter estimating methods 



of Subsection 13.41 and use the same data base as in lBaran et al\ (j2013al ). w here the authors 
calibr ated the raw ensemble with the help of the BMA gamma model of iSloughter et al. 
(120101 ) considering a training period of 28 calendar days. Here we apply the same training 
period length which allows direct comparison of the two BMA methods. In this way ensemble 
members, validating observations and BMA models are available for 146 calendar days (on 
20.11.2010 all ensemble members are missing). 



4.1 Models and diagnostics 



Using the ideas of iBaran et al\ (I2013al ) we consider two different groupings of ensemble 
members. In the first case we have two exchangeable groups. One contains the control 
denoted by f c while in the other are 10 ensemble members corresponding to the different 
perturbed initial conditions denoted by f P) ±, . . . , / Pl io- This leads us to model 



p(x\ f c f P ,i, • • • , f P ,w; a c , a p ; p c , P P ; a) = cug(x\ a c + /3J C , a) 



(4.1) 



1 - co 10 
1=1 



10 



where uj G [0,1], and g is defined by (13.31) . 
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Figure 2: PIT histograms for BMA post-processed forecasts using two-group 04. ip and three- 
group 04. 2 1) models. 



In the second case the odd and even numbered exchangeable ensemble members form 
two separate groups {/ Pfl , f P;3 , f p>5 , f pJ , / p>9 } and {f Pi2 , f pA , / Pl6 , f P ,s, f P ,io}, re- 
spectively, which idea is justified by the method their initial conditions are generated. To 
get them only five perturbations are calculated and then they are added to (odd numbered 
members) and subtra c ted from (even numb ered members) the unperturbed initial conditions 
( Horanvi et ah . 2011 : Bar an et al . 2013af bl). In this way we obtain the following PDF for 
the forecasted wind speed: 

Q(?\ fa f P ,h • • • , /p,io! a c , ot , a e ; (3 C , p o , (3 e ; a) = u c g(x\ a c + f c (3 c , a) (4.2) 

5 

+ S (vo9(x\<Xo + Paf P ,2t-i,o) +u e g(x\a e + /3 e f Pt 2£,cr)), 
1=1 

where for weights u c , u Q , co e G [0, 1] we have u c + 5u Q + 5u e = 1. 

In order to check the overall performance of the probabilistic forecasts (based on models 
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Truncated normal BMA 


Gamma BMA 




naive 


mean corr. 


full ML 




Two groups 


2.42 x 10" 4 


1.79 x 1(T 2 


0.13 


2.22 x 10~ 2 


Three groups 


1.37 x 10" 4 


5.56 x 1(T 2 


0.18 


1.87 x 10- 2 



Table 1: Significance levels of Kolmogorov-Smirnov tests for uniformity of PIT values cor- 
responding to two- and three-group models. 



(14. ip and (14. 2p ) in terms of a probability distribution function, the mean CRPS (the average 
of the CRPS values of the predictive CDFs and corresponding validating observations taken 
over all locations and time points considered) and the coverage and average widths of 66.7% 
and 90 % central prediction intervals are computed and compared for both BMA methods 
(truncated normal and gamma) and raw ensemble. In the latter case, the ensemble of 
forecasts corresponding to a given location and time is considered as a statistical sample, 
the empirical CDF of the e nsemble replaces th e pred ictive CDF and the sample quantiles 
are calculated according to iHyndman and Fan! (119961 . Definition 7). We remark that the 
coverage of a (1 — a)100%, a G (0,1) central prediction interval is the proportion of 
validating observations located between the lower and upper a/2 quantiles of the predictive 
distribution. For a calibrated predictive PDF this value should be around (1 — a)100%. 
Additionally, the BMA and ensemble medians are considered as point forecasts, which are 
evaluated with the use of mean absolute errors (MAE) and root mean square errors (RMSE). 



4.2 Verification results of BMA post-processing 

To get a first insight about the calibration of BMA post-processed forecasts we consider 
probability integral transform (PIT) histograms. The PIT is the value of the pred ictive cu- 



mulative distribution evaluated at the verifying observations (jRafterv et all 120051 ). which is 
providing a good measure about the possible improvements of the under-dispersive character 
of the raw ensemble. The closer the histogram is to the uniform distribution, the better the 
calibration is. In Figure [2] the PIT histograms corresponding to all three parameter estimat- 
ing methods and to both BMA models (14.11) and (14. 2 p are displayed. A comparison to the 
verification rank histogram of the raw ensemble (see Figure [1]) shows that post-processing 
significantly improves the statistical calibration of the forecasts. Further, on the basis of 
significance levels of Kolmogorov-Smirnov tests given in Table [U for both models (14. ip and 
(14. 2 p one can accept the uniformity of PIT values corresponding to truncated normal BMA 
model with full maximum likelihood parameter estimation method. However, one should 
remark that for the three group model mean corrected parameter estimation also yields 
acceptable PIT values. 

In Table [2] scores for different probabilistic forecasts are given together with the average 
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Forecast 


Mean 
CRPS 


MAE 


RMSE 


Average width 


Coverage (%) 


66.7% 


90.0% 


66.7% 


90.0% 


Two 
groups 


Trunc. 

normal 

BMA 


naive 


0.7225 


1.0631 


1.3800 


2.5738 


4.2850 


67.81 


89.79 


mean c. 


0.7062 


1.0520 


1.3784 


2.6175 


4.3420 


69.86 


90.55 


full ML 


0.7071 


1.0518 


1.3786 


2.6029 


4.3222 


69.38 


90.14 


Gamma BMA 


0.7577 


1.0678 


1.4213 


2.6359 


4.5297 


68.08 


90.34 


Three 
groups 


Trunc. 

normal 

BMA 


naive 


0.7213 


1.0612 


1.3771 


2.5645 


4.2709 


67.26 


89.86 


mean c. 


0.7042 


1.0480 


1.3737 


2.6043 


4.3195 


69.38 


90.34 


full ML 


0.7044 


1.0485 


1.3739 


2.5948 


4.3073 


68.84 


90.14 


Gamma BMA 


0.7556 


1.0643 


1.4153 


2.6153 


4.4931 


68.36 


90.21 


Raw ensemble 


0.8599 


1.1215 


1.4634 


1.4388 


2.2001 


38.70 


55.14 



Table 2: Mean CRPS of probabilistic, MAE and RMSE of median forecasts, average width 
and coverage of 66.7% and 90.0% central prediction intervals. 



width and coverage of 66.7% and 90.0% central prediction intervals. Verification measures 
of probabilistic forecasts and point forecasts calculated using truncated normal BMA models 
(14. ip and (14 .2p are compared to the correspon ding measure s calcul ated for the raw ensemble 



and applying gamma BMA post-processing fjBaran et all l2013al ). Compared to the raw 



ensemble all BMA post-processed forecasts show a significant decrease in all there verification 
scores considered. Further, as the listed CRPS, MAE and RMSE values show, the accuracy 
of the truncated normal BMA probabilistic and point forecasts is better than the accuracy 
of the gamma BMA ones. 

Concerning calibration, one can observe that the coverage of both BMA central prediction 
intervals are rather close to the correct coverage for all models considered, while the coverage 
of the central prediction intervals calculated from the raw ensemble are quite poor. This 
shows that BMA post-processing greatly improves calibration. Further, the truncated normal 
BMA models yields slightly sharper predictions than the gamma BMA forecasts and one can 
also observe that the three-group model slightly outperforms the two-group one. 

Finally, we remark that mean correction step (13.151) in full ML parameter estimation 
method seems essential. Running the algorithm without it (that is fi^l '■= + 
Pk +1 "' ] fk,s,t) e.g. for the three group model yields smaller CRPS (0.7024) but larger MAE 
and RMSE values (1.0499 and 1.3900) and wider central prediction intervals. Moreover, in 
this case the PIT values do not fit the uniform distribution. 
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5 Discussion 



We introduced a new BMA model for post-processing ensemble forecasts of wind speed 
providing a predictive PDF which is a mixture of normal distributions truncated from below 
at zero. The model was tested on the 11 member ALADIN-HUNEPS ensemble of the HMS 
using two different BMA models. One assumes two groups of exchangeable members (control 
and forecasts from perturbed initial conditions), while the other considers three (control and 
forecasts from perturbed initial conditions with positive and negative perturbations). For 
both models a 28 day training period was used and three types of parameter estimation: a 
naive and two more sophisticated one with mean correction and full maximum likelihood 
estimation. The latter resulted PIT values which perfectly fit the uniform distribution both 
for the two- and for the three-group model. The comparison of the raw ensemble and of the 
truncated normal BMA forecasts shows that the mean CRPS values of BMA post-processed 
forecasts are considerably lower than the mean CRPS of the raw ensemble. Furthermore, 
the MAE and RMSE values of BMA median forecasts are also lower than the MAEs and 
RMSEs of the ensemble median. The calibration of BMA forecasts is nearly perfect as the 
coverage of the 66.7% and 90.0% prediction intervals are very close to the nominal levels. 
From the three competing parameter estimation methods the overall performance of the full 
ML estimation seems to be the best. 



Compared to the perfo rmance of gamma B MA model of ISloughter et al\ ( 120101 ) for wind 
speed data investigated in lBaran et al\ ( )2013al ) one can observe that truncated normal BMA 
post-processing yields slightly lower CRPS, MAE and RMSE values and narrower central 
prediction intervals. However, the great advantage of the model presented here appears in 
the speed of parameter estimation. The EM algorithm for estimating the weights and distri- 
bution parameters of the truncated normal BMA model uses closed formulae while in case 
of the gamma BMA model a numerical optimization is required. Running the ensembleBMA 



package of R flFraley et al. 



2009 



20 111 ) on a PC under a 64bit Window 7 operating system 



(Intel Core i5-3470 CPU, 3.40 GHz, 4 cores, 8 Gb RAM) the estimation of parameters of 
the three-group model e.g. for 25.03.2011 using training period of 28 calendar days, for the 
gamma BMA model took 10.23 seconds, while for the truncated normal with naive, mean 
corrected and full ML parameter estimation methods, 6.86 s, 6.83 s and 10.41 s, respectively. 
The difference is more convincing when we perform modeling for all possible 148 calendar 
days. The corresponding running times were 2716.98 s for the gamma BMA and 797.89 s, 
918.16 s and 1245.5 s for the truncated normal BMA models. In this way the truncated 
normal BMA model outperforms the traditional gamma BMA model both in accuracy and 
calibration of forecasts and in computation time. 
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